ABSTRACT

Textual Analysis of Language Samples (TEXAN) is a
computer program which can count a number of variables needed for use
in readability formulas. Three studies which utilize TEXAN are
reported in this paper: (1) In 1972, Norman and Helen Felsenthal
randomly selected 20 books from the 1306 in Eakin's "Good Books for
Children" and calculated their internal consistency using Gunning,
Spache, and two Flesch formulas. The results disprove the speculation
that difficulty increases from beginning to end in many children's
books. The study also compared the four readability estimates;
correlations varied from high to negative. (2) In a 1973 study, Alden
J. Moe investigated the readability of selected Newbery Award Books.
Fry, Gunning, and Lorge formulas were used but did not provide the
same grade level estimates when applied to a single sample. (3) Also
in 1973, Norman Felsenthal analyzed the readability and specialized
vocabulary of nine selected U.S. history texts in grades 5, 8, and
11 using Flesch, Fry, and Lorge formulas. Results indicated the three
fifth-grade books were in excess of their intended level of usage.
The others were closer to their intended levels but a wide variation
in scores among the three formulas existed. Other studies using
computer programs are also reviewed. (TO)

Matching child with book in terms of interest and readability
level has been a long-time problem of the classroom teacher. Interest
level and readability are closely related because the desire to read is
invariably reduced when material is too difficult.

Although much research has centered on readability, the classroom 
teacher still has very little information concerning the readability of 
materials, especially trade books. Reference periodicals usually give
a gross estimate such as "for use in intermediate grades" or "for use in
grades 3-5." Consequently, the teacher is often left with the task of
determining a more precise difficulty level.

The main reason for this limited information is not the lack of
formulae to identify readability but rather the tedious and time-consuming
work which is necessary to collect the data needed to calculate
readability.

This paper describes a computer program which greatly reduces the
time involved in the calculation of readability and also reviews several
studies which have utilized computer programs to determine readability.
Although teachers seldom have access to computers, the technique described
can be used by publishing companies and/or curricula specialists and the
information made available to teachers.

The term "readability" has been defined and interpreted many ways.
Edgar Dale and Jeanne Chall, two of the best known researchers in this
speciality, state: "In the broadest sense, readability is the sum total
(including interactions) of all those elements within a given piece of
printed material that affects the success which a group of readers have
with it. The success is the extent to which they understand it, read
it at optimum speed and find it interesting (1948, p. 38)."

The matching of interest and readability is emphasized in Gilliland's
statement:

On the one hand there is a collection of individuals with
given interests and reading skills. On the other hand, there
is a range of books and other reading materials, differing
widely in content, style and complexity. The extent to which
books can be read with profit will be determined largely by
the way in which the two sides are matched. For example, a
person who is a competent reader may soon be deterred from
reading if her choice is restricted to simple, repetitive texts.
Similarly, a person with limited reading ability may soon become
discouraged if he is given texts which are beyond his
comprehension (1972, p. 12).

A concise and inclusive definition is offered by Lamb (1973):
"Readability is the sum of factors, and the interactive effect of these
factors, which may be greater than the sum, affecting an individual's
ability to comprehend what is read. Factors typically considered in
readability are number of words in a sentence, number of syllables in
words, and frequently, an analytical comparison of the words in a selection
with those included on a standardized list of some type" (p. 2).

Interest in readability dates as far back as the McGuffey readers,
where an attempt was made to grade materials in terms of difficulty level.
Lamb (1973), in her recent review of literature concerning readability,
notes that Thorndike's Teacher's Word Book (1921) was one of the first
efforts to objectively measure the difficulty level of reading materials.
Virtually all of the early attempts to analyze readability relied upon
vocabulary variables as the main factors in determining difficulty
level. Between 1934 and 1938 readability research efforts were broadened
and more than vocabulary factors were considered. Reliance on word
lists compiled by Thorndike and others diminished and attention was
given to factors such as sentence length and syntactical construction
such as parts of speech. Efforts to make readability formulas more
efficient by reducing the number of variables in the formulas were noted
during the 1940's and 1950's (Dale and Chall, 1948; Flesch, 1948; Lorge,
1939; Yoakum, 1951).

Letter redundancy (Carterette and Jones, 1963) and independent
clause frequency (Strickland, 1962) are two additional variables included
in the more recent studies. The Cloze procedure, a patterned deletion
of words from passages, has also been utilized with considerable interest
by researchers in readability (Bormuth, 1963, 1966; Klare, 1963;
Ramanauskas, 1972; Rankin and Culhane, 1969; Taylor, 1953).

The attempt to automate readability measurement is of more recent
origin and has, for the most part, utilized computer programming. One
notable exception is the Readability Index Tabulator, which is attached
to an electric typewriter to collect readability data. The attachment,
developed by Smith and Kincaid (1970), tabulates the number of strokes
(letters), the number of words, and the number of sentences. Information
from the tabulator is then processed by a computer to determine
readability (Kincaid et al., 1972).

Computer usage eliminates the tedious tasks of counting words,
syllables, sentences, and other variables which are needed for use in
readability formulas. One such program, TEXAN (Textual Analysis of
Language Samples), has been developed at Purdue University. This program
counts a number of variables including: total words (any combination of
alphanumeric characters delineated by spaces), total non-exempt words
(those words not included in a special listing such as the Dale list of
769 easy words), different non-exempt words, special words (those words,
up to 100, which the programmer designates; e.g., a list of pronouns or
other words with special significance), statements, questions, exclamations,
total sentences, quotations, words per sentence, non-exempt words per
sentence, characters per word, characters per non-exempt word, and average
occurrences of each non-exempt word (listed either alphabetically, by
frequency, or by first occurrence as determined by the sub-routine
requested by the user).
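
The counts listed above can be illustrated with a short sketch. It is not
the TEXAN program itself, whose source is not reproduced in this paper;
the function name, the punctuation rules, and the small exempt list are
assumptions made only for illustration.

```python
import re

PUNCT = ".,;:!?\"'()"

def count_variables(text, exempt_words):
    """Count a few TEXAN-style variables for one language sample."""
    # A word is any run of characters delimited by whitespace; surrounding
    # punctuation is stripped so "home." and "home" compare equal.
    words = [t.strip(PUNCT).lower() for t in text.split()]
    words = [w for w in words if w]
    non_exempt = [w for w in words if w not in exempt_words]

    # Sentences are classified by terminal punctuation followed by
    # whitespace or end of text. Abbreviations such as "U.S." would be
    # miscounted by this simple rule.
    statements = len(re.findall(r"\.(?=\s|$)", text))
    questions = len(re.findall(r"\?(?=\s|$)", text))
    exclamations = len(re.findall(r"!(?=\s|$)", text))
    sentences = statements + questions + exclamations

    return {
        "total_words": len(words),
        "non_exempt_words": len(non_exempt),
        "different_non_exempt_words": len(set(non_exempt)),
        "statements": statements,
        "questions": questions,
        "exclamations": exclamations,
        "total_sentences": sentences,
        "words_per_sentence": len(words) / sentences if sentences else 0.0,
        "characters_per_word": sum(map(len, words)) / len(words) if words else 0.0,
    }

# Hypothetical sample and exempt list, standing in for the Dale list.
sample = "The dog ran home. Did the cat follow? Yes! The cat ran too."
easy = {"the", "dog", "ran", "home", "did", "cat", "yes", "too"}
result = count_variables(sample, easy)
```

In practice the exempt list would be a full word list such as the Dale
list, and quotation counts and special-word counts would need further
rules not attempted here.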

One key element of many readability formulas which cannot be
obtained directly by the TEXAN program is syllable count. This can be
accurately estimated, however, by dividing the number of letters in any
message by 3.1127. This constant is derived from data analyzed in a
research study where forty language samples were studied and correlations
were run between a man-made syllable count and both letter count and
vowel count. Number of characters and number of syllables correlated
at .98 with a mean ratio of characters to syllables of 3.1127. Number
of vowels and number of syllables correlated at .86 with a mean ratio
of vowels to syllables of 1.761 (Felsenthal, Shamo, Bittner, 1971).
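
The estimate amounts to a single division. The sketch below applies the
two reported ratios; the function names are hypothetical, and counting
only alphabetic characters as "letters" and only a, e, i, o, u as vowels
are assumptions, since the counting rules of the original study are not
given here.

```python
LETTERS_PER_SYLLABLE = 3.1127  # mean ratio of characters to syllables
VOWELS_PER_SYLLABLE = 1.761    # mean ratio of vowels to syllables

def syllables_from_letters(text):
    """Estimate syllables by dividing the letter count by 3.1127."""
    letters = sum(1 for ch in text if ch.isalpha())
    return letters / LETTERS_PER_SYLLABLE

def syllables_from_vowels(text):
    """Estimate syllables by dividing the vowel count by 1.761."""
    vowels = sum(1 for ch in text.lower() if ch in "aeiou")
    return vowels / VOWELS_PER_SYLLABLE

passage = "The dog ran home"
by_letters = syllables_from_letters(passage)  # 13 letters / 3.1127
by_vowels = syllables_from_vowels(passage)    # 5 vowels / 1.761
```

Because the letter count correlated more highly with the hand count than
the vowel count did, the letter-based estimate is the one to prefer.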

Three studies which have utilized the TEXAN program are reported below.
Felsenthal and Felsenthal (1972) calculated the internal consistency
of readability for 20 books randomly selected from among the 1306 books
included in Eakin's (1962) Good Books for Children. Each book yielded
three language samples of approximately 200-300 words each from the
first third of the book, the middle portion, and the final portion.
The sixty language samples were key punched and processed utilizing
the TEXAN program. Measurements were transferred to work sheets and
a calculator was used to ascertain the readability for each passage
using four different readability formulas: Gunning's Fog Index (Gunning,
1952), Spache (Spache, 1957), Flesch Reading Ease (Flesch, 1963) and
Flesch's Human Interest (Flesch, 1963). Readability indices for the
three samples from each book were compared and Chi-square procedures
were employed to determine if readability variations within each book
were excessive. None of the eighty Chi-squares (four indices x twenty
books) was significant. Internal consistency of the twenty books as a
single sample was measured in the second analysis. One-way analyses
of variance were performed for each of four readability indices. None
of the four indicated significant differences between the first, second,
and third portions of the books. This study verified the internal
consistency of these books, which disproves the speculation by some educators
that books become harder as they progress from beginning to end.

A secondary purpose of this study was to compare the readability
estimates of the four readability indices utilized. The correlations
indicated a relatively high correlation between the Gunning and Spache
formulas, a moderately low correlation between the two Flesch formulas,
and a negative correlation between the two Flesch formulas and the
Gunning and Spache grade level indicators. This negative correlation
was anticipated since higher scores on the two Flesch indices mean
greater ease in reading as opposed to grade level indices which increase
in value as they increase in difficulty.
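
The opposite direction of the two kinds of scale can be seen by computing
both for the same counts. The sketch below uses the commonly published
forms of the Flesch Reading Ease and Gunning Fog formulas; whether the
study used exactly these forms is an assumption, and the counts for the
two passages are invented for illustration.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Commonly published Flesch Reading Ease: higher means easier."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def gunning_fog(words, sentences, hard_words):
    """Commonly published Gunning Fog Index: higher means harder.

    hard_words is the number of words of three or more syllables.
    """
    return 0.4 * ((words / sentences) + 100.0 * (hard_words / words))

# Hypothetical counts for an easier and a harder 100-word passage.
easy_re = flesch_reading_ease(words=100, sentences=10, syllables=130)
hard_re = flesch_reading_ease(words=100, sentences=4, syllables=170)
easy_fog = gunning_fog(words=100, sentences=10, hard_words=5)
hard_fog = gunning_fog(words=100, sentences=4, hard_words=20)
```

The harder passage receives the lower Reading Ease score and the higher
Fog grade level, which is why the two kinds of index correlate negatively.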

In another study, Moe (1973) investigated the readability of
selected Newbery Award Books and developed a word list of 200 high
frequency words common to all books analyzed. Five 100 word samples
were taken from each of three parts of the books: the middle of the
first chapter, the middle of the middle chapter, and the middle of the
last chapter. Readability was estimated using three formulas: Fry (1968),
Gunning (1952), and Lorge (1959). Results indicated that the three
readability formulas usually did not provide the same grade level
estimates when applied to a single sample; however, the Lorge and the
Fry provided similar results. The Gunning generally rated samples as
being more difficult than either the Lorge or Fry estimates. In general,
the sample of Newbery Award Books which were analyzed in this study
had readability levels primarily in the fifth through seventh grade
levels although the range was from second grade through ninth grade.

The 200 high frequency words were identified by analyzing all 75
language samples (three passages from each of 25 books). The particular
words can be found in the research reported by Moe (1973); however, the
first eleven words were the same as those identified in an earlier word
study of primary-grade trade books (Moe, 1972). These words were: the,
and, a, to, he, of, in, was, his, it, and I.
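
A ranking of this kind can be produced by pooling the samples and counting
occurrences. The sketch below is an assumption about the general approach,
not Moe's procedure: it ranks words by pooled frequency only and does not
check that a word appears in every book. The function name and the sample
sentences are hypothetical.

```python
from collections import Counter

def high_frequency_words(samples, top_n=200):
    """Return the top_n most frequent words across all language samples."""
    counts = Counter()
    for sample in samples:
        # Lowercase each token and strip surrounding punctuation.
        counts.update(w.strip(".,;:!?\"'()").lower() for w in sample.split())
    counts.pop("", None)  # discard tokens that were punctuation only
    return [word for word, _ in counts.most_common(top_n)]

samples = ["The cat and the dog.", "The bird and a cat."]
top_three = high_frequency_words(samples, top_n=3)
```

Note that lowercasing merges "I" with "i"; a study that reports "I" as a
separate entry would need to preserve case for that word.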

Another research study (Felsenthal, 1973) analyzed the readability
and specialized vocabulary of selected U.S. history texts in grades
five, eight, and eleven, and made comparisons between materials designed
for each of these three grade levels. Data from a fourth or "news
periodical" level (Time, U.S. News and World Report, and Newsweek) was
also examined and described.

Nine U.S. history textbooks currently in heavy use throughout
the country were selected to provide data for the study. Three of
these texts were specifically written for and are widely used in
fifth grade classes; three other texts are used in eighth grade and
the remaining three in eleventh grade. Five 200 to 300 word language
samples were randomly selected from each of the nine books for a total
of forty-five language samples. An additional fifteen samples were
drawn from news magazines, five each from the magazines previously
mentioned. The total sample consisted of sixty separate selections
and total data exceeded 1^,000 words. The sixty language samples
were key-punched and processed utilizing the TEXAN program.

The author identified readability through the use of three
formulas: Flesch (1963), Fry (1968), and Lorge (1959). Results
indicated that all three fifth grade textbooks yielded readability
scores in excess of their intended levels of usage (i.e., from one and
a half to two years higher than the designated fifth grade level).
Readability scores for the eighth and eleventh grade books were closer
to their intended levels, although there was wide variation
among the three formulas.

Readability scores for the news periodicals seemed closely related
to those of the eleventh grade texts. The greatest factor of change
across the fifth, eighth, and eleventh grade levels was in the number
and percentage of large words. News periodicals, however, used fewer
large words than did eleventh grade texts. The one factor that showed
a consistent rate of increase across all four levels was sentence
length, with fifth, eighth, eleventh, and the news periodical selections
running approximately 12, 1^, 18, and 22 words per sentence respectively.

Another intent of the research, the development of a hierarchy of
specialized words related to the study of U.S. history and current
events, was not realized. A special count of those words used four or
more times and not in the Dale list of 769 easy words yielded a paucity
of social studies words. In the fifth grade sample only "slavery" could
be identified as a word unique to social studies. "Armistice" was the
only unique word found in the eighth grade samples. In the eleventh
grade sample only "federal", "political", "representative", and
"Republican" could be labeled "social studies" words. Even more
surprising was the absence of frequently used social words in the news
periodicals. Only six words were unique ("American", "history", "U.S.",
"Johnson(s)", "President(ial)", and "Vietnam"), and the latter three
were a function of the particular date and time of the sample.

As stated earlier, all three of the studies previously cited
utilized the TEXAN computer program to provide the variables needed
for readability identification. In each case the TEXAN program generated
much more data than was actually used by the researchers. Consequently,
additional analyses may be performed at a later date.

In summary, conclusions from the three studies using TEXAN reveal: 

1) Trade books tend to be internally consistent in terms of 
readability. Classroom teachers need not be concerned 
that the reading level becomes more difficult, or changes 
at all, as the student progresses through a book. 

2) Selected social studies texts used for U.S. history in
grades 5, 8, 11 tend to have a greater reading difficulty
level than the assigned grade. This is particularly true
of fifth grade texts where the readability level was 1½ to 2
years higher than the grade. Therefore the readers of
these textbooks should be at least average or preferably
above average readers. Corrective and remedial readers
cannot be expected to use these books.

3) It is both difficult to ascertain and unnecessary to
teach a particular social studies vocabulary since those
words utilized in social studies texts are virtually
identical to words employed in standard reading.

4) News periodicals such as Time, U.S. News and World Report
and Newsweek can be used as supplementary reading at the
eleventh grade level since the readability of these
magazines approximates that of the eleventh grade social
studies texts.

5) The readability level of such important books as the
Newbery Award and other popular books should be more
precisely identified. In the past most
Newbery Award books have had readability levels between fifth
and seventh grade.

6) A list of high frequency words (such as the Dolch) can
be determined for various language samples. It is important
that just eleven words (the, and, a, to, he, of, in, was,
his, it, and I) appear to be used extensively in almost all
language samples.

Some implications can also be drawn from the three
computer-assisted studies:

1) Although there are many readability formulas currently in
use, few hold their value across a broad range of difficulties.
The Spache formula is best limited to primary usage, Lorge 
seems appropriate for junior high, and Fry for high school. 
The Flesch Reading Ease Index seems to have the broadest 
range of the formulas examined in the three studies. 

2) More readability checks should be made on texts used in 
the content areas such as social studies and science. The 
actual difficulty of content subject texts may be quite a 
bit higher than the designated grade level. 

3) Since content texts are often authored by more than one
person, the consistency of readability throughout the text
should be examined. This process does not seem to be
necessary for single-authored trade books, however.

It is indeed feasible to utilize a computer program to measure
stylistic variables and calculate readability levels. This automation
of a previously tedious task promises to offer the classroom teacher much
more information concerning the readability of various materials.